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The uniform distribution on matrices with specified row and col- 
umn sums is often a natural choice of null model when testing for 
structure in two-way tables (binary or nonnegative integer). Due to 
the difficulty of sampling from this distribution, many approximate 
methods have been developed. We will show that by exploiting certain 
symmetries, exact sampling and counting is in fact possible in many 
nontrivial real-world cases. We illustrate with real datasets including 
ecological co-occurrence matrices, social networks, and contingency 
tables, and we apply our method to finding the Ehrhart polynomials 
of the Birkhoff polytope. Further, we demonstrate how the method 
can be used to assess the accuracy of certain approximate samplers, 
by finding the total variation distance to the uniform distribution. 



1. A motivating example. In ecology, co-occurrence tables are used 
to summarize biogeographical data. For instance, Table 1 indicates the pres- 
ence/absence of 26 mammalian species in 28 mountain ranges in the Amer- 
ican Southwest. When presented with such data, one might wonder: What 
factors control which species live in which habitats? In 1975, ecologist (and 
now, renowned author) Jared Diamond stunned the ecology community with 
the proposal of specific "assembly rules" governing the allocation of species 
to habitats. Diamond (1975) observed that certain pairs of species tended 
to occur together, and other pairs tended to be disjoint, suggesting that 
cooperation and competition play a key role. But did these patterns really 
reflect species interactions, or were they merely due to random chance? 

To address this question, Connor and Simberloff (1979) suggested statis- 
tical hypothesis testing. Since some species are simply more prolific than 
others, and some habitats are larger than others, a sensible choice of null 
model is the uniform distribution on co-occurrence matrices with the ob- 
served numbers of habitats per species (row sums) and species per habitat 
(column sums). Connor and Simberloff (1979) presented a formidable chal- 
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Table 1 

26 mammalian species in 28 mountain ranges (Patterson and Atmar, 1986) 
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lenge to Diamond's theory by showing that under this simple null model, 
the statistics observed by Diamond could easily have arisen by chance. A 
contentious debate erupted, yielding an extensive body of research on test 
statistics and null models for detecting various types of "structure" in ecolog- 
ical matrices. Decades later, the basic null model of Connor and Simberloff 
has withstood the test of time, and is now a mainstay in the analysis of 
ecological matrices (see e.g. Ulrich and Gotelli (2007); Gotelli and McCabe 
(2002), and references therein). 

As a concrete example, consider the montane mammals in Table 1. Pat- 
terson and Atmar (1986) proposed a model in which, during the most re- 
cent glacial period, cold-adapted species inhabited a region spanning sev- 
eral mountain ranges and the low-lying areas between, but in the current 
(warmer) interglacial period these populations have receded into the moun- 
tains and become extinct in some areas. They suggest that this would cause 
the set of species found in one mountain range to tend to be a subset of 
those found in another. This led them to consider the following nested subset 
statistic, equal to the number of species-habitat pairs such that the species 
does not occur in that habitat but does occur in a less-populated habitat: 



Snest = ^I( a ij = 0) Qj > m i)> 
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where A = (a^) is a binary matrix with species as rows and habitats as 
columns (such as Table 1), qj = ^i a «j> an< ^ m i = rnin{(/j : a y - = 1, j = 

I, . . . , n}. Here, -/"(£?) is 1 if E 1 is true, and otherwise. (Note: Smaller S nes t 
means more "nestedness".) To perform a hypothesis test using the standard 
null model described above, one would estimate the p- value for S nes t by sam- 
pling from the uniform distribution over binary matrices with the observed 
row and column sums — in this case, (26, 26, 25, 22, 22, 18, 12, 12, 12, 11, 
10, 10, 8, 8, 8, 7, 6, 6, 5, 5, 4, 4, 3, 3, 1, 1) and (26, 24, 23, 21, 19, 13, 13, 12, 

II, 10, 10, 9, 9, 7, 7, 7, 7, 7, 7, 6, 6, 5, 5, 4, 3, 2, 1, 1), respectively. However, 
it is difficult to sample exactly from this distribution. Instead, Patterson and 
Atmar used an approximation in which the entries of each column are drawn 
proportionally to the row sums, conditioned on the column sum. (The row 
sums are not constrained in their approximation.) They drew 1000 samples 
from their approximation, estimated the p-value of S nes t to be 9 x 10 -20 
for Table 1, and concluded that the data does exhibit significantly more 
nestedness than one would expect under the null. Patterson and Atmar's 
(1986) article was highly influential, inspiring many subsequent studies into 
nestedness (see e.g. Ulrich and Gotelli (2007) and references therein). 

The preceding scenario is commonplace — the combinatorial problem 
that arises from constraining the row and column sums makes it difficult to 
sample exactly from the desired uniform distribution. As a result, on all but 
the most trivial matrices, researchers have resorted to approximate meth- 
ods, such as Markov chain Monte Carlo (MCMC), Sequential Importance 
Sampling, and heuristic approaches such as the one described above. With 
all these approximate methods, the nagging question remains: Did the use 
of an approximate distribution significantly affect the result? 

In this work, we describe an efficient algorithm for sampling exactly from 
the uniform distribution over binary or nonnegative integer matrices with 
given row and column sums (provided that most of the sums are not too 
large). As a result, Monte Carlo estimates of quantities of interest, such as 
p-values, can be accompanied by exact confidence intervals (that is, true 
confidence intervals, rather than intervals based on asymptotic approxima- 
tions). Further, our algorithm computes the exact number of such matrices. 



Table 2 

Sample statistics of Snest for montane mammal data (Table 1) 



Method 


# samples 


estimated p- value 


mean 


std. dev. 
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max 
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Fig 1: Histogram of S nes t for 10 6 exact samples from the uniform distribution 
over matrices with margins as in Table 1. The dashed line is the value of 
S nes t for Table 1. 



In Table 2, we compare the results of Patterson and Atmar with results 
based on 10 6 exact samples using our algorithm. There are large discrepan- 
cies. We estimate the p- value to be 0.0322 ± .00018 (p ± se), with an exact 
95% confidence interval of [0.0318,0.0326] (based on the binomial c.d.f., not 
on se). Their estimate of 9 x 10 -20 is far smaller — in fact, the value of the 
test statistic on the observed matrix, S nes t = 63, is considerably less than 
the smallest value among all of their 1000 samples, S nes t — 180. (Note: It 
appears that they used a normal approximation to the distribution of S nes t 
to estimate the p-value.) Meanwhile, in view of the histogram of exact sam- 
ples in Figure 1, the observed matrix appears relatively typical! Recall that 
in their approximation, the entries of each column are drawn proportionally 
to the row sums, conditioned on the column sum, and that the row sums 
are not constrained. Apparently, omitting the constraint on the row sums 
has a drastic effect. As a result, the analysis dramatically underestimated 
the p-value. This vividly illustrates the utility of exact sampling in these 
problems. 

Our algorithm required 46 seconds to find that there are 2, 663, 296, 694, 
330,271,332,856,672,902,543,209,853,700 (« 2.7 x 10 39 ) binary matrices 
with row and column sums as in Table 1, and subsequently, required 4.2 
milliseconds per exact sample. (All computations reported in this paper 
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were performed using a 2.8 GHz processor with 6 GB of RAM.) We know 
of no other algorithm capable of exact counting and sampling for matrices 
of this size. 

2. Overview. Let N(p, q) be the number of m x n binary matri- 
ces with margins (row and column sums) p = (pi, . . . ,p m ) € N m and 
q = (qi, . . . ,q n ) S N n , respectively, and let M(p,q) be the correspond- 
ing number of N-valued matrices. (We use N = {0, 1, . . . } throughout.) In 
this paper we develop a technique for finding N(p, q) and M(p, q), and for 
exact uniform sampling from these sets of matrices — an important and 
challenging problem (Diaconis and Gangolli, 1995; Chen et al., 2005). Our 
method is feasible for modestly-sized matrices (roughly, m + n < 100 with 
current desktop computing power) or very sparse large matrices. 

As described above, in the binary case, an important application is testing 
for structure in ecological co-occurrence matrices. It turns out that most 
real-world ecology matrices are small enough that our method is feasible. In 
the N-valued case, an important application is the conditional volume test 
of Diaconis and Efron (1985) for two-way contingency tables (for which our 
method is feasible as long as the margins are relatively small). In addition 
to these direct applications, a major auxiliary benefit of having an exact 
method is that it enables one to measure the accuracy of certain approximate 
methods (which can scale to matrices far larger than our exact method can 
accommodate). We will show how to obtain precise estimates of the total 
variation distance between the uniform distribution and an approximate 
distribution, provided that one can sample from and compute probabilities 
under the approximation. 

Since a bipartite graph with degree sequences p = (pi, ■ ■ ■ ,p m ) € N m , 
q = (qi, . . . ,q n ) 6 N n (and m,n vertices in each part, respectively) can be 
viewed asamxn matrix with row and column sums (p, q), our technique ap- 
plies equally well to counting and uniformly sampling such bipartite graphs. 
Under this correspondence, simple graphs correspond to binary matrices, 
and multigraphs correspond to N-valued matrices. 

The distinguishing characteristic of our method is its tractability on ma- 
trices of nontrivial size. In general, computing M(p, q) is #P-complete 
(Dyer, Kannan and Mount, 1997), and perhaps N(p, q) is as well. However, 
if one assumes a bound on the column sums then our algorithm computes 
both numbers in polynomial time. After counting, uniform samples may 
be drawn in polynomial expected time for bounded column sums. To our 
knowledge, all previous algorithms for the non-regular case require super- 
polynomial time (in the worst case) to compute these numbers, even for 
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bounded column sums. (We assume a description length of at least m + n 
and no more than m log a+n log b, where a = max pi, b = max In general 
(without assuming a bound on the column sums), our algorithm computes 
N(p, q) or M(p, q) in O (m(ab + c) (a + b) b ~ 1 (b+c) b ~ 1 (log c) 3 ) time for m x n 
matrices, where a = rnaxp^, b = maxgj, and c = ^2pi = Qi- After count- 
ing, uniform samples may be drawn in O(mclogc) expected time. 

In complement to most approaches to exactly computing M(p, q), which 
are efficient for small tables with large margins, our algorithm is efficient for 
large tables with small margins. For instance, computing Af(p,q) for the 
100 x 100 matrices with p = q = (5( 20 \ 4^°\ 3^ , 2^ , l^), where x^ 
denotes x repeated k times, takes 701 seconds (see Appendix for the exact 
number, which is approximately 2.96 x 10 434 ). Likewise, computing iV(p, q) 
for the same p and q takes 688 seconds (see Appendix for the exact number, 
approximately 2.35 x 10 ). 

The remainder of the paper is organized as follows. In the rest of this 
section, we describe related work from the literature. In Section 3, we first 
describe the intuition behind our technique, then formally state the recur- 
sions and the resulting algorithm, and give bounds on computation time. In 
Section 4, we illustrate a variety of applications. In Section 5 we prove the 
recursions, and in Section 6, we prove the bounds on computation time. 

2.1. Previous work. We briefly survey the previous work on this problem. 
This review is not exhaustive, focusing instead on those results which are 
particularly significant or closely related to the present work. Let H n (r) and 
H*(r) denote M(p, q) and N(p, q), respectively, when p = q = (r, . . . , r) € 
N n . The predominant focus has been on the regular cases H n (r) and H*(r). 

Work on counting these matrices goes back at least as far as MacMa- 
hon (1915, see Vol II, p. 161), who applied his expansive theory to find 
the polynomial for Hs(r). Redfield's theorem (Redfield, 1927), inspired by 
MacMahon, can be used to derive summations for some special cases, such as 
H n {r), H*(r) for r = 2, 3, and in similar work, Read (1959; 1960) used Polya 
theory to derive these summations for r = 3. Two beautiful theoretical re- 
sults must also be mentioned: Stanley (1973) proved that for fixed n, H n (r) 
is a polynomial in r, and Gessel (1987; 1990) showed that for fixed r, both 
H n {r) and H*(r) are P-recursive in n, vastly generalizing the recursions for 
H n (2), H*(2) found by Anand, Dumir and Gupta (1966). 

We turn next to algorithmic results more closely related to the present 
work. McKay (1983) and Canfield and McKay (2005) have demonstrated a 
coefficient extraction technique for computing iV(p, q) in the semi-regular 
case (in which p = (a, . . . , a) € N m and q = (&,...,&) 6 N n ). To our knowl- 
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edge, McKay's is the most efficient method known previously for iV(p,q). 
By our analysis it requires at least Q(mn b ) time for bounded a, b, while 
the method presented here is 0(mn b (log re) 3 ) in this case. Since this latter 
bound is quite crude, we expect that our method should have comparable or 
better performance, and indeed empirically we find that typically it is more 
efficient. If only b is bounded, McKay's algorithm is still 0(mn fe ), but the 
bound on our performance increases to 0(rren 2b_1 (log n) 3 ), so it is possible 
that McKay's algorithm will outperform ours in these cases. Nonetheless, 
it is important to bear in mind that McKay's algorithm is efficient only in 
the semi-regular case (while our method permits non-regular margins). If 
neither a nor b is bounded, McKay's method is exponential in b (as is ours). 

McKay and Wormald (1990) presented an intriguing randomized algo- 
rithm for exact uniform sampling from the set of binary matrices with mar- 
gins p, q, provided that all of the margins are sufficiently small. It takes 
0((m + n) 2 fc 4 ) expected time per sample, where k is an upper bound on 
all of the margins. Unfortunately, the conditions under which it applies are 
rather restrictive — in fact, for most of the real- world problems we have 
encountered, it reduces to a simple rejection sampling scheme that is rather 
inefficient. 

Regarding M(p, q), one of the most efficient algorithms known to date is 
LattE (Lattice point Enumeration) (De Loera et al., 2004), which uses the 
algorithm of Barvinok (1994) to count lattice points contained in convex 
polyhedra. It runs in polynomial time for any fixed dimension, and as a 
result it can compute M(p, q) for astoundingly large margins, provided that 
m and re are small. However, since the computation time grows very quickly 
with the dimension, LattE is currently inapplicable when m and re are larger 
than 6. There are similar algorithms (Mount, 2000; De Loera and Sturmfels, 
2003; Beck and Pixton, 2003) that are efficient for small matrices. 

In addition, several other algorithms have been presented for finding 
N(p, q) (such as Johnsen and Straume, 1987; Wang, 1988; Wang and Zhang, 
1998; Perez- Salvador et al., 2002) and M(p,q) (see review by Diaconis and 
Gangolli, 1995) allowing non-regular margins, however, it appears that all 
are exponential in the size of the matrix, even for bounded margins. While in 
this work we are concerned solely with exact results, we note that many use- 
ful approximations for N(p, q) and M(p, q) (in the general case) have been 
found, as well as approximate sampling algorithms (Holmes and Jones, 1996; 
Chen et al., 2005; Greenhill, McKay and Wang, 2006; Canfield, Greenhill 
and McKay, 2008; Harrison and Miller, 2013). 
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3. Main results. 



3.1. Idea of the algorithm. Before formally presenting the results, we 
introduce the simple observation underlying our approach. The basic idea is 
very straightforward — the surprising part is that it leads to an algorithm 
that is efficient for many nontrivial datasets. 

For simplicity, consider the binary case. The first thing to notice 
is that N(p, q) is unchanged by permutations of the entries of p or 
q (as is easy to see). Now, suppose p = (6,6,4,3,1,1,1) and q = 
(0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3), and consider the partially-filled matrix in Ta- 
ble 3 in which the first row is u = (0, 0, 1, 1,0, 1, 0, 1, 0, 1,0, 1,0) and the rest 
of the matrix is undetermined. 



Table 3 
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There are iV(p,q) binary matrices with margins (p,q), and there are 
clearly N(Lp, q— u) of these with u as the first row, where L denotes the left- 
shift map: Lp = (p2, ■ ■ ■ ,p m )- Divide the first row into blocks 0, 1, 2, . . . , m 
where block k contains the columns i such that qi = k. Since the number of 
matrices is unchanged under permutations of the margins, then N(Lp, q — 
u) = N(Lp, q — v) for any permutation v of u such that the number of ones 
in each block is unchanged. If is the size of block k, and Sk is the number 
of ones in block k, then there are 




such permutations v, where r = (ri,...,r m ) and s = (si, . . . ,s m ). For 
instance, in this example, ( r s ) = (3) (^) (2). Note that so will be 0, and thus 
will be 1, for any u such that N(Lp, q — u) 7^ 0. Therefore, 

N(p,q)= Yl ^P,q-u) = WjiV(Lp,q-u s ) 

u6{0,l} n s ^ ' 

where the second sum is over nonnegative integer vectors s = (s\, . . . , s m ) 
summing to p±, and u s £ {0, l} n is any binary vector with ones in block 
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k for each k = 1, ... , m, and zero ones in block 0. This defines a recursion 
for N(p, q) which, when carefully implemented using dynamic program- 
ming and the Gale-Ryser criterion (described below), is the basis for our 
algorithm. This computation yields a data structure that enables efficient 
sampling in a row- by-row fashion. There is a similar (although more subtle) 
recursion for the case of M(p, q). 

3.2. Recursions, algorithms, and bounds. Introducing the following no- 
tation will be useful. We consider N n to be the subset of N°° := {(ri, V2, . . . ) : 
r,j G N for i = 1, 2, . . . } such that all but the first n components are zero. Let 
L : N°° ->• N°° denote the left-shift map: Lr = (r 2 , r 3 , . . . ). Given r, s G N°°, 
let r\s := r — s + Ls, (which may be read as "r reduce s"), and let f denote 
the vector of counts, f := (fi,f2, • • • ) where fj := #{j : rj = i}. We write 
r < s if n < Si for all i. Given n G N, let C n (k) := {r E N" : £V n = k} 
be the n-part compositions (including zeros) of k, and given s G N n , let 
C s (k) := {r G C n (k) : r < s}. Since N(p, q) and M(p,q) are fixed un- 
der permutations of the row sums p or column sums q, and since zeros 
in the margins do not affect the number of matrices, then we may define 
^(PjQ) := -W(p>ci) an d M(p,q) := M(p, q) without ambiguity. 

Theorem 3.1 (Recursions). The number of matrices with margins 
(p, q) G N m x N n is given by 

(1) N(p, r) = ^2 ( ^-^(^PirV 8 ) for binary matrices, and 

(r -(- Ls\ — 
| M(Lp, r\s) /or N-valued matrices, 

where r = q, and in (2), we sum over all s suc/i that s G C r+is (pi). 

For the proof, see Section 5. In the binary case, computation of this 
sum is greatly simplified by summing only over those s G C r {p\) for which 
N(Lp, r\s) is nonzero. This can be efficiently achieved using the Gale-Ryser 
criterion (Gale, 1957; Ryser, 1957), which provides the following necessary 
and sufficient condition for the existence of a binary matrix with margins 
(p, q): when q[ := #{j : qj > i} and pi > ■ ■ ■ > p m -, we have N(p, q) ^ 

if and only if Yn=iPi - Yl=xl'% for a11 3 < m and YaLiP* = YT=i^i- 
This is easily translated into a condition in terms of (p,q) and iV(p,q). In 
the N-valued case, there is no analogue to the Gale-Ryser criterion (since 
M(p, q) > for any nonnegative p, q such that YliPi = Qj)- The follow- 
ing recursive procedure can be used to compute either iV(p, q) or M(p, q). 
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Algorithm 3.2 (Counting). 
Input: (p,q), where (p,q) € N m x N n are margins such that ^^Pi = Ylili- 
Output: N(p, q) (or M(p, q) ), the number of binary (or N-valued) matrices. 
Storage: Lookup table initialized with N(Q, 0) = 1 (or M(0, 0) = 1). 

(1) If N(p, q) ('orM(p,q)J is in £/ie lookup table, return the result. 

(2) In the binary case, if Gale-Ryser gives N(p, q) = 0, store the result 
and return 0. 

(3) Evaluate the sum in Theorem 3.1, recursing to step (1) for each term. 

(4) Store the result and return it. 

Let T(p, q) be the time (number of machine operations) required by Al- 
gorithm 3.2 to compute iV(p,q) or M(p, q), after performing an 0(n 3 ) 
preprocessing step to compute all needed binomial coefficients. (It turns out 
that computing M(p, q) always takes longer, but the bounds we prove apply 
to both N(p, q) and M(p, q).) We give a series of bounds on T(p, q) ranging 
from tighter but more complicated, to more crude but simpler. The bounds 
will absorb the 0(n 3 ) pre-computation except in the trivial case when the 
maximum column sum is 1. 

Theorem 3.3 (Bounds). Suppose (p,q) € N m x N n , a = rnaxp;, b = 
m&xqi, and c = J^Pi = Y^Qi- Then 

(1) T (P , q) = o^+cxiog cf ± (*> +_ 6 - ') (» + ' " + 

i=l 

(2) T(p,q) = 0(m{ab + c)(a + b) b - 1 (b + c) b - 1 {logc) 3 ), 

(3) T(p,q) = O^mn^-^logn) 3 ) for bounded b, 

(4) T(p, q) = 0(mn 6 (log n) 3 ) for bounded a,b. 

For the proof, see Section 6. Since we may swap p and q without chang- 
ing the number of matrices, we could use Algorithm 3.2 on (q, p) to com- 
pute N(p, q) or M(p, q) using T(q, p) operations, which, for example, is 
0(nm a (logm) 3 ) for bounded a, b. T(p, q) also depends on the ordering of 
the row sums pi, ■ ■ ■ ,p m as suggested by Theorem 3.3(1), and we find that 
putting them in decreasing order p\ > ■ ■ ■ > p m tends to work well. In 
the binary case, Algorithm 3.2 is typically made significantly more efficient 
by using the Gale-Ryser conditions, and this is not accounted for in these 
bounds. 

It is worth mentioning that a significant further reduction in computation 
time can be achieved by factoring the sums in Theorem 3.1. For example, 
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in the binary case, we use 

*(p^e(;;)e(:) -E(:r)™, 

SI x ' S2 X ' S m 

where for each k = 1, . . . ,m, is summed over a range of values chosen 
so that s will always satisfy both s G C r (p\) and the Gale-Ryser criterion. 
This improvement is also not accounted for in the bounds above. 

Algorithm 3.2 traverses a directed acyclic graph in which each node rep- 
resents a distinct set of input arguments (p, q) to the algorithm. Node (u, v) 
is the child of node (p, q) if the algorithm is called (recursively) with argu- 
ments (u, v) while executing a call with arguments (p, q). We associate with 
each node its count: the number of matrices with the corresponding margins. 
If the initial input arguments are (p, q), then all nodes are descendents of 
node (p,q). Meanwhile, all nodes with positive count are ancestors of node 
(0,0). Note the correspondence between the children of a node (u, v) and 
the compositions s E C v (u±) in the binary case, and s € C v+Ls (ui) in the 
N- valued case, under which s corresponds with the child (Lu, v\s). 

Once the counting is complete, these counts yield an efficient algorithm 
for uniform sampling from the set of (p, q) matrices (binary or N- valued). 
It is straightforward to see that since the counts are exact, the following 
algorithm yields a sample from the uniform distribution. 

Algorithm 3.4 (Sampling). 
Input: 

■ Row and column sums (p,q) € N m x N n such that = J2i Qi- 

■ Lookup table of counts generated by Algorithm 3.2 on input (p,q). 
Output: A uniformly- drawn binary (orN-valued) matrix with margins (p, q). 

(1) Initialize (u, v) (p,q). 

(2) //(u,v) = (0,0), exit. 

(3) Choose a child (Lu, v\s) of (u, v) with probability proportional to its 
count times the number of corresponding rows ( that is, the number of 
rows r G C v (u\) such that v — r = v\s.) 

(4) Choose a row r uniformly among the corresponding rows. 

(5) (u,v) <- (Lu,v-r). 

(6) Go to (2). 

In step (3), there are (Z) corresponding rows r in the binary case (in which 
r G {0,l} n ), and (^ + s Ls ) in the N-valued case. In Section 6, we prove that 
Algorithm 3.4 takes O(mclogc) expected time per sample, where c = 
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A software implementation of the algorithms above has been made avail- 
able, and contains demonstrations of the applications in Section 1 above and 
Section 4 below. 

4. Applications. 

4.1. Co- occurrence matrices in ecology. As described in Section 1, a pri- 
mary application is hypothesis testing for ecological matrices. It turns out 
that many real-world datasets can be accommodated by our method: out 
of 291 ecology matrices in a collection compiled by Atmar and Patterson 
(1995), it could handle 225. 

In Section 1, we compared our approach to a heuristic approximation. 
However, it is now more common for researchers to use MCMC (e.g. Gotelli 
and McCabe, 2002; Ulrich and Gotelli, 2007), and recently, Sequential Im- 
portance Sampling (SIS) approaches have been developed that appear to 
improve upon MCMC (Chen et al, 2005; Harrison and Miller, 2013). 

Table 4 

13 species of finch in 11 of the Galapagos Islands (Chen et al., 2005) 





Habitat 


Species 


01 


02 


03 


01 


05 


06 


07 


08 


09 


10 


11 


12 


13 


14 


15 


16 


IT 


A 








1 


1 


1 


1 


1 


1 


1 


1 





1 


1 


1 


1 


1 


1 


B 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 





1 





1 


1 








C 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 





1 


1 








D 








1 


1 


1 








1 





1 





1 


1 





1 


1 


1 


E 


1 


1 


1 





1 


1 


1 


1 


1 


1 





1 





1 


1 








F 
































1 





1 














G 








1 


1 


1 


1 


1 


1 


1 








1 





1 


1 








H 



































1 

















I 








1 


1 


1 


1 


1 


1 


1 


1 





1 








1 








J 








1 


1 


1 


1 


1 


1 


1 


1 





1 





1 


1 








K 








1 


1 


1 





1 


1 





1 























L 








1 


1 









































M 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 


1 



To compare with these alternatives, we consider a benchmark dataset for 
this application. Table 4 indicates the presence/absence of 13 species of finch 
on 17 of the Galapagos Islands. It comes equipped with the colorful name 
"Darwin's finches" because Charles Darwin's development of the theory of 
evolution was inspired in part by his observations of these birds. The row 
and column sums of the matrix are (14, 13, 14, 10, 12, 2, 10, 1, 10, 11, 6, 
2, 17) and (4, 4, 11, 10, 10, 8, 9, 10, 8, 9, 3, 10, 4, 7, 9, 3, 3), respectively. 
This table is the subject of an analysis by Chen, Diaconis, Holmes, and 
Liu (2005), in which they use MCMC and their SIS algorithm to estimate 



FIXED-MARGIN MATRICES 



13 



the p- value for the S 2 statistic of Roberts and Stone (1990): 

S* 2 = — V r 2 

\2J i<j 

where C = (cy) = AA T and A is an m x n co-occurrence matrix. For given 
margins, larger values of S 2 are interpreted as indicating greater competition 
and cooperation among species (as measured by the variance of the number 
of habitats Cjj shared between a randomly selected pair of species). 

Table 5 

Sample statistics of S 2 for Darwin's finch data ( Table 4 ) 



Method 


# samples 


estimated p-value 


Exact 

SIS 
MCMC 


1,000,000 
1,000,000 
15,000,000 


(4.67 ± .22) x 10 -4 
(3.96 ± .36) x 10~ 4 
(3.56 ± .68) x 10 -4 



The results of Chen et al. are reported in Table 5, alongside our results 
using exact sampling. Our results largely confirm the conclusion of Chen et 
al. — the p-value is small, leading one to reject the null hypothesis (see Fig- 
ure 2). The computation time for our method is very competitive: for this 
dataset, counting the number of matrices takes 0.02 seconds, and exact sam- 
pling takes 0.16 milliseconds /sample, compared to 1.1 milliseconds/sample 
for SIS and 0.07 milliseconds/sample for MCMC. (However, on large matri- 
ces, MCMC and SIS should be significantly faster than our method, due to 
the overhead incurred by counting.) The SIS algorithm of Chen et al. also 
yields an estimate of 6.7150 x 10 16 for the number of matrices. We com- 
pute the exact number of matrices to be 67,149,106,137,567,626, and note 
that this agrees with the exact number reported by Chen et al. (which they 
obtained by other means). 

We should emphasize, however, that in comparison with MCMC and SIS, 
the appeal of our method is not its speed, but rather its exactness. There 
are no guarantees that the estimates coming from MCMC or SIS have ad- 
equately converged. Meanwhile, we are drawing exact i.i.d. samples from 
the null distribution, which provides many guarantees. For example, using 
the 10 6 exact samples above, we obtain an exact 95% confidence interval of 
[4.26,5.12] x 10 for the p-value. (By "exact", we mean that it is a true 
confidence interval, with no approximations involved. Reported confidence 
intervals often involve two approximations: (1) approximate samples used for 
estimation, and/or (2) approximate interval construction based on asymp- 
totics, such as the normal approximation to the binomial.) In a longer run 
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E 

■S 1-5h 



52 

Test statistic 



Fig 2: Histogram of S 2 for 10 6 exact samples from the uniform distribution 
over matrices with margins as in Table 4. The dashed line is the value of S 2 
for Table 4. 



of 10 8 exact samples, we estimate the p-value to be 4.69 x 10 and ob- 
tain an exact 99.9% confidence interval of [4.62,4.77] x 10 -4 . Interestingly, 
this only barely overlaps with the SIS approximate 95% confidence interval, 
[3.25,4.67] x 10" 4 . Clearly, it is advanta geous to use the exact method. 

4.2. Social networks. Another application is hypothesis testing on social 
networks. Although many social network datasets are far too large, some 
are within the range of feasibility for our method. We illustrate with two 
classic datasets. Over the period 1978-1981, Galaskiewicz (1985) recorded 
the membership status of 26 of the CEOs of major corporations in the 
Minneapolis - St. Paul area, with respect to 15 clubs and boards. This 
resulted in a 26 x 15 binary matrix with row sums (3, 3, 2, 3, 3, 3, 4, 3, 4, 

2, 3, 2, 4, 7, 5, 5, 6, 5, 5, 5, 3, 3, 4, 5, 3, 3) and column sums (3, 11, 22, 12, 

3, 4, 4, 4, 6, 3, 4, 5, 5, 3, 9). Since some individuals will tend to belong to 
more clubs and boards (being more sociable or more influential), and some 
clubs and boards will simply be larger than others, the uniform distribution 
on matrices with fixed margins seems to be a reasonable null model. 

As before, our method enables efficient exact results: we count the number 
of matrices to be 25,533,540,876,226,059,861,329,182,058,955,286,218, 
365, 274, 646, 344, 655 « 2.55 x 10 52 in 0.92 seconds, and draw exact samples 
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at a rate of 0.43 milliseconds /sample. As with the ecology matrices, the 
S 2 statistic can be used to test for structure in affiliation matrices such as 
this — in particular, to detect whether individuals exhibit a tendency to 
segregate into cohesive groups. From 10 4 exact samples, we estimate the 
p-value to be 0.125 with an exact 95% confidence interval of [0.118,0.132]. 
Apparently, the data does not strongly indicate such behavior. 

Sometimes, social network data takes the form of a N- valued matrix. 
For instance, Doreian, Batagelj and Ferligoj (2005) analyzed the network of 
marriages among noble families in the Republic of Ragusa during the 16th 
century. This data consists of a 24 x 24 (asymmetric) N-valued matrix with 
entry indicating the number of marriages with husband from family 

i and wife from family j. The row and column sums are (3, 0, 3, 0, 19, 
0, 4, 7, 5, 4, 12, 7, 4, 8, 0, 4, 1, 1, 2, 5, 0, 14, 2, 8) and (0, 1, 4, 2, 20, 
2, 4, 4, 5, 4, 15, 9, 3, 6, 1, 2, 0, 0, 2, 4, 1, 21, 0, 3), respectively. We count 
949, 599, 133, 340, 064, 609, 956, 529, 916, 243, 690, 765, 923, 314, 405, 123, 631, 
501,076,903,661 ~ 9.5 x 10 62 matrices in 36,708 seconds, and sample at a 
rate of 1.9 seconds/sample. (N-valued matrices can take significantly longer 
than binary matrices.) 

4.3. Contingency tables. Two-way contingency tables are a large class 
of N-valued matrices that arise frequently in statistics. Pearson's chi-square 
is the classical test of independence in such tables, however, when the inde- 
pendence hypothesis fails, it can be misleading to interpret the chi-square 
p- value as a measure of the deviation from independence. As a starting point 
for quantifying the departure from independence, Diaconis and Efron (1985) 
propose the conditional volume test (CVT): place the uniform distribution 
over tables with the observed margins, and compute the probability of ob- 
serving a chi-square value less than that of the observed table (in other 
words, compute the p- value under this new null distribution). Larger values 
of the CVT p-value indicate greater deviation from independence. 

Since our method enables exact sampling from the CVT null distribution, 
it can be used to estimate the p- value for the CVT. While many contingency 
tables that arise in practice will have margins that are too large for our 
method, some will fall within the feasible range. 

To illustrate, Table 6(a) shows Francis Galton's (1889) data recording 
the heights (s = short, m = medium, t = tall) of 205 married couples (e.g. 
medium-short, tall-tall, etc.). Table 6(b) has the same margins but different 
entries, and in Table 6(c), every entry is exactly double that of Table 6(b). 
For (a), (b), and (c), Table 7 shows the results of Pearson's chi-square test 
(the approximate p- value) and the conditional volume test (the estimated p- 
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(a) 



t 


12 


20 


18 


m 


25 


51 


28 


s 


9 


28 


14 




s 


m 


t 



Table 6 



(b) 


8 


14 


28 


20 


61 


23 


18 


24 


9 



(c) 


16 


28 


56 


40 


122 


46 


36 


48 


18 



Table 7 

Pearson's chi-square and the conditional volume test of Diaconis and Efron (1985) 





chi-square p- value 


CVT p-value 


CVT 95% CI 


Table 6(a) 


0.57 


0.0011 


[0.0005, 0.0020] 


Table 6(b) 


1.2 x 10" 5 


0.13 


[0.121,0.136] 


Table 6(c) 


1.8 x 10 -11 


0.13 


[0.123,0.137] 



value and exact 95% confidence interval). The chi-square test and the CVT 
both indicate that (a) is close to independence, and that (b) is not close to 
independence. For (c), a naive interpretation of the chi-square p- value would 
be that (c) is far further from independence than (b); meanwhile, the CVT 
indicates that it deviates from independence by roughly the same amount 
as (b), as one would expect. 

These CVT results were each obtained using 10 4 exact samples, drawn at 
a rate of 0.14 seconds/sample for Tables 6(a) and (b) (after 0.3 seconds re- 
quired to count the 1,268,792 tables), and 1.94 seconds/sample for Table 6(c) 
(after 7.7 seconds required to count the 19,151,218 tables). 

We test our method further with two examples from Diaconis and Gangolli 
(1995). The first is an artificial 5x3 table, with margins (10, 62, 13, 11, 39), 
(65,25,45). Our algorithm takes 2.3 seconds to count the 239,382,173 cor- 
responding tables, and 0.3 seconds/sample. Their second example is sig- 
nificantly more challenging: a 4 x 4 table recording the eye color and hair 
color of 592 subjects, with margins (220, 215, 93, 64), (108, 286, 71, 127). Our 
algorithm takes 16,145 seconds to count the 1,225,914,276,768,514 corre- 
sponding tables, and 90 seconds/sample. These counting results match the 
exact numbers reported by Diaconis and Gangolli (1995), obtained by other 
means. 

Our method is not well-suited to small tables with large margins (like 
the last example) , since it exploits the symmetries that arise when there are 
many columns. For small tables, there are significantly faster algorithms for 
counting, such as LattE (De Loera et al., 2004). In particular, these algo- 
rithms can handle extremely large margins (while ours cannot). However, 
such algorithms do not scale well to larger tables. In contrast, our method 



FIXED-MARGIN MATRICES 



17 



can handle somewhat larger tables, as long as the margins are sufficiently 
small — consider, for example, the 100 x 100 example at the end of Section 2. 

4.4. Ehrhart polynomials of the Birkhoff polytope. Let H n (r) = M(p,q) 
with p = q = (r, r, ... , r) G N n . Stanley (1973) proved that for any n G 
{1, 2, . . . }, H n (r) is a polynomial in r. For example, H^(r) is 

H 4 (r) = 1 + (65/18)r + (379/63)r 2 + (35117/5670)r 3 + (43/10)r 4 
+ (1109/540)r 5 + (2/3)r 6 + (19/135)r 7 + (ll/630)r 8 + (ll/11340)r 9 . 

Given H n {l),H n {2), . . . , H n ({ n ~ ls )), one can solve for the coefficients of 
H n {r) (as we describe below). These polynomials have been computed for 
n < 9 by Beck and Pixton (2003). As an application of our method, we 
computed them for n = 4,5, 6, 7, 8, requiring 0.2, 0.3, 0.6, 12.9, 5043 seconds, 
respectively. This is comparable to (or slightly faster than) the computation 
time for Beck and Pixton's algorithm (even though theirs is specialized for 
this purpose). The polynomials we computed matched those reported by 
Beck and Pixton. 

The coefficients of H n (r) can be determined by the following method. 
By Stanley (1973), H n {r) is a polynomial in r such that (a) deg H n {r) = 
(n - l) 2 , (b) H n {r) = for all r G Z such that -n + 1 < r < -1, and (c) 
H n (-n -r) = (-l) n - l H n (r) for all r G N. Let k = ( n ~ 1 ) and d = (n - l) 2 . 
Compute the numbers H n (r) for r = 0,1, ... ,k using Algorithm 3.2, and 
form the vector v := (H n (— n — k+1), . . . , H n (k)) T G Z d+1 using (b) and (c). 
Form the matrix A = (a^ft^ G z( d+1 ) x ( d+1 ) where = (-n - k + , 
and compute u = A -1 v. Then by (a), H n (r) = Ylj=i Ujr^~ l . 

4.5. Total variation distance of approximate methods. Another interest- 
ing application of our exact method is using it to measure the accuracy 
of certain approximate sampling methods. Here, we show how to obtain a 
precise estimate of the total variation distance between the uniform distri- 
bution and an approximation to it, provided that one can sample from the 
approximate distribution and compute the probability of any given matrix 
under it. 

Let P, Q be two probability measures on a finite space X. (We will be 
thinking of X as a set of matrices with given margins, P as the uniform 
distribution, and Q as an approximation.) If R = ^P + and R(x) > 
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for all x € X, then 



d^{P,Q)= l -Y,\ P ^ x )-^ x )\ 



E 



P(x)-Q(x)\ 
P(x) + Q(x) 



R(x) 



P(X!) + Q(Xi) 



1 



n 



\P(Xj) - Q(Xj)\ 
P(Xi) + Q(Xi) 



= E 



n 



: d 



i=l 



where X±, . . . , X n ~ i? i.i.d.. Since < |a — 6|/(a + 6) < 1 for any a, 6 > 
such that a + b > 0, by Hoeffding's inequality we have 



for any e > 0. Therefore, for any a 6 (0, 1), if e = y (l/2ra) log(2/a) then 
[d n — e, d n + e] is a 100(1 — a)% confidence interval for d<rv(P, Q) — that 
is, P(|4 - d TY (P,Q)\ <e)>l-a. 



This makes it possible to precisely measure the accuracy of an approxi- 
mation to the uniform distribution. To illustrate, we assess the Sequential 
Importance Sampling (SIS) approximation of Harrison and Miller (2013). 
Choose P to be the uniform distribution on a set X of binary matrices with 
given margins (p, q), and choose Q to be this approximation. To sample Xi 
from R = |P + T^Q, flip a fair coin — if it is heads then sample from P, 
if it is tails then sample from Q. To get the value of d n , we must compute 
Q(Xi), . . . , Q(X n ), and for this choice of Q, this is provided for by the al- 
gorithm of Harrison and Miller (2013). Since this Q is supported on X, and 
P is uniform on X, then P{X{) = ■■■ = P(X n ) = l/\X\ = l/JV(p, q), which 
we precompute using our counting algorithm. 

For example, for the 100 x 100 matrices with p = q = (5^ 20 \ 4( 20) , 3^ 20 \ 
2(2°), l(2°)), taking n = 10 4 , we estimate d TV (P,Q) to be d n = 0.002 and 
obtain a 99.9% confidence interval of (i n ±0.0195. The approximation appears 
to be quite good in this case. Note that this matrix is sparse, with fairly 
regular margins. 

As a contrasting example, for the 13 x 17 finch matrix from Section 4.1, 
again taking n = 10 4 , we estimate d-jy(P,Q) to be d n = 0.253 and again 
obtain a 99.9% confidence interval of d n ±0.0195. This matrix is not sparse, 
and both margins are highly irregular. 

By assessing an approximation on a variety of examples, one can char- 
acterize when it does well and when it does poorly. This provides a guide 
for selecting an approximation that is well-suited to a given situation, and 
facilitates the design of better approximations. 



H\dn ~ d T y(P,Q)\ >e)< 2exp(-2ne 2 ) 
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5. Proof of recursions. In this section, we prove Theorem 3.1, making 
rigorous the intuitive argument given in Section 3.1. As an alternative to 
the "direct" proof below, in (Miller and Harrison, 2011) we also provide a 
generating function proof that employs some of the beautiful properties of 
symmetric functions, and yields results of a more general nature. 

5.1. Preliminary observations. For r G N n , let r' denote the conjugate of 
r, that is, r\ = #{j : rj > i} for i = 1, 2, 3, . . . . For r, s € N°°, let r As denote 
the component- wise minimum, that is, (ri A S\,r2 A S2, • • •)• In particular, 
r A 1 = (ri A 1, r<i A 1, . . . ). Recall our convention that N™ is considered 
to be the subset of N°° such that all but the first n components are zero. 
(Similarly, we consider Z n C 

Lemma 5.1. Let u, v G N n such that u < v. 

(1) Suppose s G N d for some d G N. Then v — u = v\s if and only if 
s = v' — (v — u)'. 

(2) If s = v' — (v — u)' then J2 s i = Y1 u i an d s < v + Ls. 

(3) // s = v' — (v — u)' and u < v A 1 then s < v. 

Proof. (1) Letting / be the identity operator, a straightforward calcu- 
lation shows that for any d G N, r G Z d , we have {Y2k=o L k ) (I — L)r = r 
and (/ - L) (£fclo Lfc ) r = r, that is, (/ - L)" 1 = J2™ =Q L k on Z d . Fur- 
ther, (I — L) _1 f = r'. Thus, v — u = v\s = v — s + Ls if and only if 
(/ — L)s = v — v — u if and only if s = (/ — L)~ 1 (v — v — u) = v' — (v — u)'. 

(2) If s = v'-(v-u)' then^> = XX-£(v-u)J =E^-£(^-Q = 
Ui since Yl r \ = ^2 r i f° r an r G N n . By (1) , v — s + Ls = v\s = v — u > 

and so s < v + Ls. 

(3) If s = v' — (v — u)' and u < v A 1 then by definition, Si = #{j : Vj > 

■ v j~ u j >i} = #{j : vj = i and Uj = 1} < #{j : Vj = i} = fy. □ 

We follow the usual notational convention that f(A) = {/(a) : a G ^4} 
when A is a subset of the domain of a function /. 

Lemma 5.2. Let v G N n , A; G N, and Zei /(u) = v' - (v - u)' for u G N n 
sitc/i that u < v. 

(1) /(C vA1 (A;)) = C^(fc), and /or any s G C^ife), 
#{uGC7 vA1 (fc):/(u)=s} = 
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(2) f(C v (k)) = {s : s G C+ Ls (k)}, and for any s such that s G C*+ Ls (k), 
#{u G C v (k) : /(u) = s} ^ V + Ls ' 

PROOF. (1) /(C vA1 (/c)) C C^(Jfe) follows from Lemma 5.1(2 and 3). Let 
s G C v (k). Choose u as follows. For i = 1,2,3,..., choose s, of the 
positions j such that Vj = i, and set Uj = 1 for each chosen j. (Set = 
for all remaining j.) This determines some u G C vA1 (k) such that Sj = jj={j : 

= ? and Uj = 1} for all i. Furthermore, it is not hard to see that any 
such u is obtained by such a sequence of choices. Now, as in the proof of 
Lemma 5.1(3), Si = #{j : Vj = i and Uj = 1} if and only if /(u) = s (when 
u < v A 1). Hence, f(C vA1 (k)) D C v (k), and since there were possible 
ways to choose u, then this proves (1). 

(2) f(C v (k)) C {s : s G C*+ Ls (k)} follows from Lemma 5.1(2). Suppose 
s G C* +Ls (k). Let r = v and t = r\s. Note that t > since s < r + Ls. 
Also, r, s, t G N d where al = maxuj. Choose u as follows. First, consider the 
rd positions j in v at which Vj = d. There are (^rj ways to choose td of these 
rd positions. Having made such a choice, we set Uj = Vj — d = for each such 
j that was chosen. Next, consider the r^_i positions j at which Vj = d — 1, 
in addition to the — td remaining positions at which Vj = d. There are 
( rd ~ 1 ~tj r i~ td * > ) wa y s to choose td-i of these. Having made such a choice, we set 
Uj = Vj — (d— 1) for each such j that was chosen. Continuing in this way, for 
i = d — 2, . . . , 1: consider the positions j in v at which Vj = i, in addition 
to the rj+i + • • • + rd — td — ■ ■ ■ — ti + \ remaining positions at which Vj > i, 

choose ti of these (in one of ( % 1+1 ^ ^ d ways), 

and set Uj = Vj — i for each such j that was chosen. After following these 
steps for each i, set Uj = Vj for any remaining positions j. This determines 
some u such that < u < v. 

Now, for i = d,d — 1, . . . , 1, we have chosen ti positions j and we have 
set Uj = Vj — i. That is, ti = #{j : Vj — Uj = i}, and so t = v — u. Hence, 
v — u = v\s (by the definition of t), so s = /(u) by Lemma 5.1(1), and 
additionally, Yl u j = Yl s j = kby Lemma 5.1(2). Thus, we have shown that 
f(C v (k)) D {s : s G C*+ Ls (k)}. 

Using tj = rj — Sj + Sj + ± (the definition of t), we see that there were 

rd\ ( rd-i + (r d - t d )\ (t\ + r 2 H Vr d -t d t 2 

t d ) V td-i J " '\ h 

rd\ ( r d -i + s d \ fn+s 2 \ fr + Ls\ 



SdJ V sd-i J V s% j V s 
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ways to make such a sequence of choices, where the inequality holds since 
s < r+Ls. Hence, there are at least { v+ J jS ) distinct choices of u € C v (k) such 
that /(u) = s. On the other hand, given any u € C v (k) such that /(u) = s, 
we have t = v — u (by Lemma 5.1(1)), thus tj = #{j : Uj = Vj — i}, and 
since Vj > i for any j such that Uj = Vj—i, such a u is obtained by one of the 
sequences of choices above. Hence, #{u € C v (k) : /(u) = s} = ( v+ s Ls )- □ 

5.2. Proof of Theorem 3.1. (1) First, we prove the binary case. Let 
(Pjl) £ ^ m x N n , r = q. Using Lemma 5.2(1), define the surjection 
/ : C qAl (Pl) -»• C r (pi) by /(u) = q' - (q - u)'. Then 

iV(p,r)=iV(p,q) ( = ) N(Lp,q-u) 
ued A1 ( P i) 

= E E ^p>q-u)= E (;Wp>*\«0- 

Step (a) follows from partitioning the set of (p, q) matrices according to the 
first row u £ C qA1 (pi) of the matrix. Step (b) partitions C qAlA pi) into the 
level sets of /, that is, the sets / _1 (s) = {u £ C qA1 (pi) : /(u) = s} as s 
ranges over /(C qA1 (pi)) = C r (pi). Step (c) follows since if /(u) = s then 
q — u = r\s (by Lemma 5.1(1)) and thus N(Lp, q — u) = N(Lp, r\s), and 
since #/ _1 (s) = r) (by Lemma 5.2(1)) . This proves Theorem 3.1(1). 

(2) Now, we consider the N- valued case. Let 5 = {s : s £ C r+Ls (p\)}. 
Using Lemma 5.2(2), define the surjection g : C q (pi) — > S by g(u) = q' — 
(q — u)'. Then, similarly, 

M(p, r) = M(p, q) ( = } M(Lp, q - u) 

uecq(pi) 

= E E M(L P , q -u)^^( r+ s LS )M(L P ,r\s). 

As before, step (a) follows from partitioning the set of matrices according 
to the first row u € C q (pi), step (b) partitions C q (pi) into the level sets 
of g, and step (c) follows since # 5 -i(s) = ( r + Ls ) (by Lemma 5.2(2)). This 
proves Theorem 3.1(2), completing the proof of Theorem 3.1. 
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6. Computation time. Let W(r) := X^fc=i ^ r k = the weight of r G Z n . 

Lemma 6.1 (Properties of the weight). Ifr, s G Z n i/ien 

(1) VF(r + s) = W(r) + W( s ) 

(2) ^(s-Ls) = x:s, 

(3) W(r\s) = W(r) - J> 

(4) Pr(s) = E«i- 

Proof. All four are simple calculations. □ 

For the rest of this section, fix (p, q) G W n x N n such that Yl,Pi = Yl 9i> 
and consider (p, q) to be the margins of a set ofmxn matrices. First, we 
address the time to compute iV(p, q) using Algorithm 3.2, and M(p, q) will 
follow easily. 

Let T)(p, q) denote the set of nontrivial nodes (u, v) in the directed acyclic 
graph (as discussed in Section 3) descending from (p, q) (including (p, q)), 
where nontrivial means (u,v) ^ (0,0). Let Afc(j) := {s G N fc : W(s) = j} 
for j, k G N. The intuitive content of the following lemma is that the graph 
descending from (p, q) is contained in a union of sets Afc(j) with weights 
decreasing by steps of p\ , . . . , p m . 

Lemma 6.2 (Descendants). If tj = ^^jPi and b = max^j then 

D(p,q) C {(u,v) : u = L^p, v G A 6 (i,), j = l,...,m}. 

PROOF. By the form of the recursion, (u,v) G P(p,q) if and only if 
for some 1 < j < m there exist s , . ..,s J in C r (pi), ■ ■ ■ , C r (Pj~i) 
respectively, with r 1 = q, r J+1 = r J \s* for i = 1, . . . ,j — 1, such that (u, v) = 
(L J, ~ 1 p, r?). For j > 2, by Lemma 6.1(3 and 4), 

W(r j ) = W^" 1 ^'" 1 ) = W{r j ~ l ) -pj-x = VF(r J - 2 ) - Pi _ 2 - pj-i 

n J— 1 m j—1 

= ... = ^(r 1 ) - (p x + ... +Pj _ 1 ) = ^qi - ^pi = ^pi - ^pi = tj, 

i=l i=l i=l i=l 

and r J G N 6 by construction, so r J G Ab(tj). Hence, (u, v) = (L- J ~ 1 p,r J ) 
belongs to the set as claimed. □ 

Let T(p, q) be the time (number of machine operations) required by the 
algorithm (Algorithm 3.2) to compute N(p, q) after precomputing all needed 
binomial coefficients. Let r(u, v) be the time to compute N(u, v) given 
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N(Lu, v\s) for all s £ C v {u\). That is, T(p,q) is the time to perform the 
entire recursive computation, whereas r(u, v) is the time to perform a given 
call to the algorithm not including time spent in subcalls. 

Let no := ■ q% > 0} denote the number of nonempty columns. By 
constructing Pascal's triangle, we precompute all of the binomial coefficients 
that may be needed, and store them in a lookup table. We only need binomial 
coefficients with entries less or equal to no, for the following reason. In the 
binary case, the recursion involves numbers of the form with s < v, and 
for any descendent (u,v) and any i > we have Vi < no since the number 
of columns with sum i is less or equal to the total number of nonempty 
columns. For the N-valued case, the same set of binomial coefficients will be 
sufficient, since then we have numbers of the form ( v+ s Ls ) with s < v + Ls, 
and thus 

Vi + s i+ i <Vi + v i+ i + s i+2 < ■■■ <Vi + v i+ i + v i+2 H < n , 

where the last inequality holds because the number of columns j with sum 
greater or equal to i is no more than the total number of nonempty columns. 
Since the addition of two cZ-digit numbers takes Q(d) time, and there are 
( n °2~ 2 ) binomial coefficients with entries less or equal to no, then the bound 
log 10 ({) + 1 < no log 10 2 + 1 on the number of digits for such a binomial 
coefficient shows that this pre-computation can be done in 0(n|]) time. Ex- 
cept in trivial cases (when the largest column sum is 1), the additional time 
needed does not affect the bounds on T(p, q) that we will prove below. 
We now bound the time required for a given call to the algorithm. 

Lemma 6.3 (Time per call). r(u, v) = 0((ab + c)(logc) 3 |C&(iii)[) for 
(u, v) € T>(p, q), where a = max pi, b = max qi, andc = ^2pi. 

Proof. Note that we always have Vi < uq, since the number of columns 
with sum i cannot exceed the number of nonempty columns. Thus, in the 
recursion formula, for each s in the sum corresponding to (u, v), we have 
the bound 

' i=l i=l 

Let T m (k) be the time required to multiply two numbers of magni- 
tude k or less. By the algorithm of Schonhage and Strassen (1971), 
T m (k) = O((log A;) (log log A;) (log log log k)). Therefore, T m (Q) < T m (c a ) = 
0(a(logc) 3 ). Since we have precomputed the binomial coefficients, the time 
required to compute (Z) is thus bounded by 0(a6(log c) 3 ). To finish comput- 
ing the term corresponding to s in the recursion formula, we must multiply 
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Q by iV(Xu,v\s). Since 

m / \ m 

N(Lu, v\s) < iV( P ,q) < J] ( n ° ) < JJng 4 = nl < c c , 

i=l \Pi/ i= i 

then this multiplication can be done in T rn (N(p, q)) < T m (c c ) = 0(c(log c) 3 ) 
time. Since we are summing over C v (u±), and C v {u\) C Cb(ui), then alto- 
gether we have r(u, v) = 0((a6+c)(log c) 3 \Cb(ui)\) for the time per call. □ 

/ j _|_ ^ _ \\ 

Lemma 6.4. #A fc (j) < ( J k _ J j /or any j, A: G N. 

PROOF. The map /(r) = (lri, 2r2, . . . , fcr^) is an injection / : Afc(j) — > 
C k (j). Thus, #A fc (j) < #C k (j) = p+*7 x ). □ 

We are now ready to prove Theorem 3.3. 

Proof of Theorem 3.3 for iV(p,q). By storing intermediate results 
in a lookup table, once we have computed iV(u, v) upon our first visit to 
node (u, v), we can simply reuse the result for later visits. Hence, we need 
only expend r(u, v) time for each node (u, v) occuring in the graph. Let 
tj = YliLjPi an d d= (ab + c)(log c) 3 . Then 

(a) m 

r(p,q)= Yl t(u,v)<Y E ^ Li_1 P.v) 
(u,v)ex>( P ,q) j=iveA b (t 3 ) 

( <EE°( (i i^fe)i) = E ^^^)iiA fe ^)i) 

j v j 



(c) 

< 



E°<% + _r)(\ + -V 



<s;oK a t-i 1 )Ct-T 1 ) ) 

< 0(dm(a + b - l) 6_1 (c + 6 - l) 6 " 1 ), 

where (a) follows by Lemma 6.2, (b) by Lemma 6.3, (c) by Lemma 6.4, and 
(d) since pj < a and tj < c. This proves (1) and (2). Now, (3) and (4) follow 
from (2) since a < c < bn. □ 

Proof of Theorem 3.3 for M(p, q). Other than the coefficients, the 
only difference between the recursion for M (p, q) and that for N(p, q) is that 
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we are summing over s such that s G C r+ (jp\). Lemma 6.2 holds with the 
same proof, except with C r (pi), ... ,C r (Pj-i) replaced by C r +Ls (pi), 
C rJ 1+Lsj 1 (p J _ 1 ), respectively. Considering Lemma 6.3, let (u, v) be 
a descendent of (p,q) in the graph for M(p,q), and let s be such that 
s G C v+Ls (u\). Similarly to before, recalling that Vi + Sj+i < no (as proven 
above in our discussion of precomputing the binomial coefficients), we have 

= n + Si+1 ) < u& + s ^ si ^ «p < < < c a . 

i=i V s */ i 
This yields T m ((*+ Ls )) < T m (c a ) = 0(a(logc) 3 ), just as before. Since 

M(Lu, v\s) < M(p,q) < J] J < ^(2^ = (2c) c , 

then we also obtain T m (M(p,q)) < T m ((2c) c ) = 0(c(logc) 3 ) as before. 
Further, {s : s € C v+Ls (ui)} C Cb(u\), so altogether the time per call is 
0((ab + c)(logc) 3 \Cb(ui)\), and thus the result of Lemma 6.3 continues to 
hold. With this result, the proof of the bounds goes through as well. □ 

This completes the proof of Theorem 3.3. Now, we address the time re- 
quired to uniformly sample a matrix with specified margins. Let T r (k) be the 
maximum over 1 < j < k of the expected time to generate a random integer 
uniformly between 1 and j. If we are given a random bitstream (independent 
and identically distributed Bernoulli(l/2) random variables) with constant 
cost per bit, then T r (k) = O(logfc), since for any j < k, [log 2 j\ < [~log 2 k] 
random bits can be used to generate an integer uniformly between 1 and 
2 r io S2 il anc l then rejection sampling can be used to generate uniform samples 
over {1, . . . , j}. Since the expected value of a Geometric(p) random variable 
is 1 /p, then the expected number of samples required to obtain one that falls 
in {1, . . . , j} is always less than 2. More generally, for any fixed d G N, if we 
can draw uniform samples from {1, . . . ,d}, then we have T r (k) = 0(log k) 
by considering the base-d analogue of the preceding argument. 

Lemma 6.5 (Sampling time). Algorithm 3.4 takes 

0(mT T (n c ) + maT r (n) + m61og(a + b)) 

expected time per sample in the binary case, and 

0{mT r ((2c) c ) + maT r (n) + mblog(a + b)) 

expected time per sample in the N-valued case. IfT r (k) = O(logfc), then this 
is O(mclogc) expected time per sample in both cases. 



v + Ls 
s 



20 



J. W. MILLER AND M. T. HARRISON 



Remark. If b is bounded then O(mclogc) < O(mralogn) since c < bn, 
and so this is polynomial expected time for bounded column sums. 

Proof. By the form of the recursion, the depth of the graph descending 
from (p, q) is equal to the number of rows m, since p € N m and thus 
L m p = 0. For each of the m iterations of the sampling algorithm, we begin 
at some node (u, v), and we must (A) randomly choose a child (Lu, v\s) 
with probability proportional to its count times the number of corresponding 
rows, and then (B) choose a row uniformly from among the possible 
choices in the binary case (or ( v+ s Ls ) in the N- valued case). 

First consider the binary case. To randomly choose a child, consider a 
partition of the integers 1, . . . , N(u, v) with each part corresponding to a 
term in the recursion formula for N(u, v). Generate an integer uniformly at 
random between 1 and iV(u, v), and choose the corresponding child. Gen- 
erating such a random number takes T r (N(u, v)) < T r (iV(p,q)) < T r (rc,Q) 
time. Since there are no more than (XT 1 ) < (a + b - l) 6 " 1 children at 
any step, one can determine which child corresponds to the chosen num- 
ber in 0((6 — 1) log(a + 6 — 1)) time by organizing the children in a binary 
tree. So (A) takes 0(T r (rag) + 61og(a + &)) time. Choosing a row consists 
of uniformly sampling a subset of size Sj from a set of Vi elements, for 
i = 1, Sampling such a subset can be done by sampling without 

replacement Sj times, which takes Ylj*=o ^r(^i ~ j) < SiT r {no) time. So 

(B) can be done in X^i=i s i^r(^o) < a T r { n o) time. Repeating this pro- 
cess m times, once for each row, we see that sampling a matrix takes 
0(mT r (ng) + mblog(a + b) + maT r (rio)) time. If T r (k) = O(logfc), this is 
O(mclogno + m61og(a + b) + ma log no) < O(mclogc) since a, b, uq < c. 

For the N-valued case, the same argument applies, replacing iV(u, v) with 
M(u, v), Uq with (2c) c , and Vi with Vi + Sj+i. □ 



APPENDIX A 

Let p = q= (5(20^4(20^3(20)^(20)^(20)^ where x (k) denotes x repeated 
k times. 

N(p,q) = 235, 147, 658, 512, 972, 346, 136, 720, 844, 732, 752, 867, 234, 
190, 813, 428, 532, 503, 765, 177, 603, 865, 861, 155, 465, 020, 348, 800, 377, 
149, 750, 640, 574, 119, 630, 917, 726, 455, 436, 243, 685, 355, 314, 182, 055, 
905, 496, 772, 141, 379, 315, 815, 531, 771, 443, 489, 007, 534, 912, 882, 316, 
530, 744, 246, 557, 700, 549, 985, 656, 553, 879, 075, 329, 865, 611, 504, 724, 
448, 050, 744, 461, 432, 151, 755, 297, 042, 930, 358, 350, 343, 972, 327, 816, 
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